An assessment of the construct validity of the Child Health Utility 9D-CHN instrument in school-aged children: evidence from a Chinese trial

Background: Although there is emerging data regarding the psychometric properties of the Child Health Utility-9D instrument, more evidence is required with respect to its validity for use in different country settings. The aim of this study was to examine the construct validity of the CHU-9D-CHN instrument in Chinese children.
Methods: Baseline Health-Related Quality of Life (HRQoL) and demographic data were collected from children recruited to the CHIRPY DRAGON obesity prevention intervention randomised controlled trial in China. HRQoL was measured using the Chinese version of the CHU-9D instrument (CHU-9D-CHN) and the PedsQL instrument. CHU-9D-CHN utility scores were generated using two scoring algorithms (UK and Chinese tariffs). Discriminant validity, known-group validity and convergent validity were evaluated using a non-parametric test for trend, the Kruskal–Wallis test and Spearman correlation coefficient analysis respectively.
Results: Data were available for 1,539 children (mean age 6 years). The CHU-9D-CHN was sensitive to known group differences determined by the median PedsQL total score. Furthermore, the mean CHU-9D-CHN utility values decreased linearly with increasing levels of severity on each dimension of the PedsQL for emotional and social functioning domains. They decreased monotonically with increasing levels of severity on each dimension of the PedsQL for physical and school functioning domains (p < 0.001). Contrary to studies conducted in Western countries, and although not statistically significant, we found an indication that HRQoL, using both the CHU-9D-CHN and the PedsQL, was higher in children whose parents had lower levels of education, compared to those whose parents were university educated. The correlations between the CHU-9D-CHN utility values, using the UK and Chinese tariffs, and PedsQL total scores were statistically significant, moderate and positive (Spearman’s rho = 0.5221, p < 0.001 and Spearman’s rho = 0.5316, p < 0.001, respectively). However, each CHU-9D-CHN dimension was either weakly or very weakly correlated with each of the predetermined PedsQL domain functioning scores.
Conclusions: Overall, the findings provide some support for the construct validity of the CHU-9D-CHN within a Chinese population aged 6–7 years. However, some uncertainty remains. We recommend future studies continue to test the validity of the CHU-9D in different country settings.
Trial registration: ISRCTN Identifier ISRCTN11867516, Registered on 19/08/2015
Supplementary Information: The online version contains supplementary material available at 10.1186/s12955-021-01840-7.


Background
Obesity prevention interventions have increasingly targeted primary school-aged children [1]. This has implications for the methods of outcome measurement within economic evaluation of these interventions, as few instruments exist which are designed to generate utilities, for the construction of Quality-Adjusted Life Years (QALYs), in this age group [1]. Assessment of health status in children differs from that in adults and requires a different conceptual approach. This is because of rapid rates of development in children, dependency on parents/caregivers and differences in disease epidemiology [2]. The assessment of each individual's health-related quality of life (HRQoL) relies on their subjective evaluation of functioning in different domains. It has been suggested that children's subjective health reports are not reliable and are therefore of limited use [3]. However, research demonstrates that primary school-age children aged 8-10 years [4], and perhaps even younger [5], can adequately reflect and report their health state provided the instruments use appropriate language and the constructs are relevant to the age group. HRQoL instruments may either be self-administered or interviewer-administered by parents, caregivers or researchers. As the cognitive and language skills of young children are not completely developed, it is necessary to use interviewers to read out the questions when assessing HRQoL in this age group.
Ideally, utility-based health-related quality of life in children should be measured using an instrument specifically designed for them [6]. Although there is no gold standard for measuring utility-based HRQoL in primary school-aged children, previous research has shown that the Child Health Utility-9D (CHU-9D) is an appropriate choice [7]. It is a preference-based instrument that generates utility values anchored between the values of 0 (being dead) and 1 (perfect health), with negative values denoting states worse than being dead. It is a generic instrument, not specific to any one condition or disease, and designed for application in economic evaluation of prevention, treatment and service programmes targeted at young people where the QALY is the desired outcome measure [8]. Although it has been used in populations with a wide age range (from 6 to 17 years) [9,10], it was originally developed and validated for children aged 7-11 years in the UK [11,12]. More recently its construct validity was demonstrated in 11-17 year olds in Australia [13] and Denmark [14].
The Paediatric Quality of Life Inventory™ (PedsQL) is a widely used HRQoL instrument validated for use with young children over 5 years old in diverse populations [15,16]. It has good reliability and validity in both paediatric patients and healthy populations [15,16]. The PedsQL is currently a non-preference based instrument which does not apply any explicit weighting between item domains and therefore cannot be used to generate utility values for the construction of QALYs. However, it would be expected to produce HRQoL values which move in the same direction as the utility values.
A UK study in children aged 5-6 years [9], an Australian study in children aged 11-17 years [13], and a Danish study in high-school students [14] found evidence of lower HRQoL in children from a lower socio-economic background. These studies, together with a study from China, found a strong or moderate positive correlation between the CHU-9D utility values and PedsQL total scores [9,13,14,17]. Although there is emerging evidence regarding the psychometric properties of the CHU-9D instrument [9,13,14], there is a dearth of instruments available for assessing HRQoL among Chinese children, and more evidence is required on the CHU-9D before widespread use in China and in other settings with a large number of Chinese migrants, such as Malaysia and Singapore. This is important because the measure may have different construct validity in different populations, which might affect the results of health economic evaluations.
The aim of this study was therefore to assess the construct validity of the CHU-9D-CHN instrument in 6-7 year-old children in a Chinese setting, with the objectives being:
• To assess the known-group validity, referring to the principle that the CHU-9D-CHN should be able to demonstrate different scores for groups of children who are known to vary on HRQoL (e.g. socio-economic status [9,13,14]).
• To determine the discriminant or divergent validity of the instrument by exploring how the different dimensions of HRQoL that are theoretically not supposed to be related are actually related.
• To determine the convergent validity of the instrument, referring to the degree to which the CHU-9D-CHN and PedsQL capture a common construct of HRQoL [18].

Zanganeh et al. Health Qual Life Outcomes (2021) 19:205

Methods

Trial design and participants
The analysis presented uses data from the CHIRPY DRAGON cluster-randomised controlled trial assessing effectiveness and cost-effectiveness of a childhood obesity prevention intervention in Guangzhou, China [19,20]. Children took part in baseline measurements in 2015 when they were 6-7 years old, and were followed up for 12 months. At baseline, a range of measurements were undertaken, including HRQoL measured using the PedsQL and CHU-9D-CHN; height; weight; gender; age (in months); and socio-economic factors. This study used the complete baseline data for 1,539 children to assess the CHU-9D-CHN in relation to the PedsQL.
All year-one students from non-boarding, state-funded (residents) primary schools/clusters (n = 353) located in Guangzhou, the largest Southern Chinese city, were eligible for inclusion. The majority of Chinese children attend this type of school [21,22]. A few private schools, mainly for children of foreign residents [21,22], were not eligible. The trial study team randomly selected 40 schools using a random number generator and obtained permission to recruit from each school's principal. Informed consent was then sought for each child participant from their parents/guardians. The sample size (1,640 children) was based on being able to detect a difference of 0.17 units in the mean BMI z-scores between arms across 40 school clusters, with 80% power and at a 5% significance level.
All outcomes were collected at the individual level by independent and trained assessors (research staff) using standardised procedures and instruments. Data on participants' date of birth and gender were obtained from school records.

Anthropometric measurements
Height and weight measurements were undertaken without shoes and in light clothing. Standing height was measured at least twice with a TGZ-type height tester (Dalian). Weight was measured with an electronic scale (JH-1993 T, Weighing Apparatus Co. Ltd., Dalian, China). Body mass index (BMI) was calculated as weight in kilograms divided by the square of height in metres (kg/m²). The WHO 2007 Growth charts were used to calculate BMI z-scores and to categorise the children into underweight, healthy weight, overweight and obese groups [23].

Measurement of HRQoL
The Chinese version of the CHU-9D (CHU-9D-CHN) [24] and the PedsQL, which are both generic instruments, were chosen for the measurement of HRQoL. Both instruments were researcher-administered because of the young age of the participants.
The CHU-9D-CHN instrument combines nine dimensions of HRQoL: worried; sad; pain; tired; annoyed; schoolwork/homework; sleep; daily routine; and ability to join in activities [11,25] (Additional file 1: Appendix 1). Each dimension comprises five severity levels, resulting in 1,953,125 unique health states associated with the measure. Individual responses from the questionnaires were transformed into utility weights derived from a UK general population sample using an algorithm developed by Stevens et al. [11,25]. This gives a possible range of utility values from 0.33 (worst health state) to 1 (best health state). The CHU-9D-CHN instrument has a Chinese tariff set available for estimating utility values. However, according to the instrument developers [personal communication], at the time of this study the Chinese-specific preference weights were still in development and required further validation. It was therefore recommended to use the UK tariff set and to treat the Chinese tariff set as an exploratory analysis [26]. The Chinese tariff set used was based on utility weights derived from a Chinese student population (mean age 13 years), giving a possible range of utility values from −0.09 (worst health state) to 1 (best health state) [26].
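The number of health states quoted above follows directly from the structure of the descriptive system: nine dimensions, each answered at one of five levels, combine independently. As a quick check of the arithmetic:

```latex
\[
\underbrace{5 \times 5 \times \cdots \times 5}_{9\ \text{dimensions}} = 5^{9} = 1{,}953{,}125
\]
```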
The PedsQL is a 23-item instrument comprising four domains: physical (8 items), emotional (5 items), social (5 items), and school (5 items) functioning [15]. Each item has five response options: never; hardly ever; sometimes; often; almost always. Emerging from the instrument is a score (transformed on to a 0-100 scale) for each domain and a score for total HRQoL. Decreasing value of the score indicates poorer HRQoL. For this study the validated Chinese version of the PedsQL 4.0 instrument was used [27]. The mean score for each of the four domains was calculated by summing the values for the relevant items and dividing by the number of items answered. This process generated a mean for the total score (mean of all items), for the physical health score (mean of physical functioning items) and for the psychosocial health score (mean of emotional, social and school functioning items).
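The scoring steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the trial's scoring code: the 0-4 response coding and its reverse transformation to 0/25/50/75/100 are assumed here, and the function names and data layout are hypothetical.

```python
from statistics import mean

# Assumed reverse-scoring of raw response codes onto the 0-100 scale
# (0 = never ... 4 = almost always); higher scores mean better HRQoL.
REVERSE_SCORE = {0: 100, 1: 75, 2: 50, 3: 25, 4: 0}

# Domains as listed in the text: physical (8 items), emotional (5),
# social (5) and school (5) functioning, 23 items in total.
DOMAINS = ("physical", "emotional", "social", "school")


def score_items(responses):
    """Mean of the transformed item scores, skipping unanswered items (None)."""
    answered = [REVERSE_SCORE[r] for r in responses if r is not None]
    return mean(answered) if answered else None


def pedsql_scores(child):
    """child maps each domain name to a list of raw 0-4 responses."""
    scores = {d: score_items(child[d]) for d in DOMAINS}
    psychosocial_items = child["emotional"] + child["social"] + child["school"]
    scores["psychosocial"] = score_items(psychosocial_items)
    scores["total"] = score_items(child["physical"] + psychosocial_items)
    return scores
```

Dividing by the number of items answered, rather than by the full item count, means a child with one skipped item still receives a domain score on the same 0-100 scale.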

Known-group validity
The factors associated with HRQoL were explored. The relationships between HRQoL and weight status category (defined as either 'overweight/obese vs. healthy/underweight' or 'underweight vs. healthy weight, overweight and obese') and between HRQoL and gender were examined. HRQoL was assessed in relation to socio-economic status (SES) using the parent's education level coded as a binary variable (did; did not obtain a university degree) and as a categorical variable (school education; college vocational education; university undergraduate education; university postgraduate education). Mother/father's education level was collected through a parent-completed questionnaire at baseline and was the pre-specified proxy measure of SES in the primary analysis. Mother/father's employment status was used as an alternative measure of SES as part of a sensitivity analysis. This was coded as a binary variable (did; did not work) and as a categorical variable (working full time; working part time; unemployed or looking for work; looking after the family/house; other). Differences in HRQoL scores between groups were assessed using either the Kruskal-Wallis test (across all levels of categorical variables) or the non-parametric test for trend (across ordered categories of a variable). Non-parametric tests were used because the HRQoL variables did not follow a normal distribution (based on the Kolmogorov-Smirnov test).
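To make the group comparison concrete, the following is a minimal pure-Python sketch of the tie-corrected Kruskal-Wallis H statistic. It is not the Stata routine used in the study; the function names are hypothetical, and the p-value step (comparing H with a chi-square distribution on k − 1 degrees of freedom for k groups) is deliberately left out.

```python
from collections import Counter


def midranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: utility scores cluster at a few values, so ties are common.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction
```

In this setting each group would hold the utility values of children in one category, for example one list per parental education level.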
Statistical tests of difference were used to determine whether the CHU-9D-CHN instrument was sensitive to differences in scores between groups with known differences. Because studies from UK, Australian and Danish settings reported poorer HRQoL for children from lower socio-economic backgrounds [9,13,14], we hypothesised a similar pattern and used SES for this analysis. Furthermore, the sample was split according to the median PedsQL total score. The mean (SD) CHU-9D-CHN utility values (using the UK and Chinese tariffs) were compared between children who had a score on or above this median PedsQL score and those who had a score below it, using the t-test.

Discriminant validity
To assess the discriminant validity, we examined how well the mean CHU-9D-CHN utility values corresponded with the response options for each of the PedsQL dimensions. For this, the mean CHU-9D-CHN utility value was estimated for each level of PedsQL response on every dimension. The hypothesis was that the mean CHU-9D-CHN utility values would decrease linearly or monotonically with increasing severity on each of the PedsQL dimensions.
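The descriptive part of this check can be sketched as follows: group utilities by PedsQL response level, take the mean in each group, and test whether the means never rise as severity increases. This is a hedged illustration with hypothetical function names and an assumed 0-4 level coding; it does not reproduce the formal test for trend used in the study.

```python
from collections import defaultdict
from statistics import mean


def mean_utility_by_level(levels, utilities):
    """Mean utility for each response level (assumed 0 = never ... 4 = almost always)."""
    grouped = defaultdict(list)
    for level, utility in zip(levels, utilities):
        grouped[level].append(utility)
    return {level: mean(vals) for level, vals in sorted(grouped.items())}


def is_monotonically_decreasing(means_by_level):
    """True if the mean utility never rises as the severity level increases."""
    ordered = [means_by_level[k] for k in sorted(means_by_level)]
    return all(a >= b for a, b in zip(ordered, ordered[1:]))
```

A linear decrease is a stricter pattern than a monotonic one, so a dimension can pass this check without the means falling by equal steps between levels.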

Convergent validity
Convergent validity was explored, using statistical tests of association, to determine how the CHU-9D-CHN correlated with the PedsQL measure. Graphical means (scatter plots), along with fitted regression line and 95% CIs, for the CHU-9D-CHN utility values and the PedsQL total scores were used to show the relationship between the instruments. Then, using the Spearman's rho statistic, the correlation coefficient between the CHU-9D-CHN utility values and the PedsQL total scores was calculated. The hypothesis was that there would be a strong or moderate positive correlation between the CHU-9D-CHN utility values and PedsQL total scores [9,13,14].
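For readers who want to see the statistic itself, Spearman's rho can be computed as the Pearson correlation of the mid-ranks of the two variables. The sketch below is a pure-Python illustration with hypothetical function names, not the Stata command used in the study, and it omits the significance test.

```python
from math import sqrt


def midranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of mid-ranks.

    Raises ZeroDivisionError if either variable is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    rx, ry = midranks(x), midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)
```

Here `x` would be the CHU-9D-CHN utility values and `y` the PedsQL total scores for the same children. Because only ranks are used, the result is unaffected by the differing scales of the two instruments.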
Spearman's rank correlation coefficient (Rs) summarises the strength and direction (negative or positive) of a relationship between two instruments. The result is always between −1 and 1. The strength of the correlation was interpreted using the following guide for the absolute value of Rs [28]: 0.00-0.19, a very weak correlation; 0.20-0.39, a weak correlation; 0.40-0.69, a moderate correlation; 0.70-0.89, a strong correlation; 0.90-1.00, a very strong correlation.
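The interpretation guide above can be written as a small lookup. This sketch makes two assumptions: the bands are applied to the absolute value of the coefficient, and they are treated as half-open intervals so that values between the printed limits (for example 0.195) still receive a label. The function name is hypothetical.

```python
def correlation_strength(rho):
    """Label the strength of a Spearman coefficient using the bands quoted in the text."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("Spearman's rho must lie between -1 and 1")
    magnitude = abs(rho)  # the sign gives direction only, not strength
    if magnitude < 0.20:
        return "very weak"
    if magnitude < 0.40:
        return "weak"
    if magnitude < 0.70:
        return "moderate"
    if magnitude < 0.90:
        return "strong"
    return "very strong"
```

For example, the coefficients reported in the abstract (0.5221 and 0.5316) both fall in the "moderate" band.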
The content and coverage of the two instruments were further examined by assessing the correlation between individual CHU-9D-CHN dimensions and the PedsQL domains that were conceptually similar, as follows:
• Physical functioning: pain, tired, sleep, daily routine
• Emotional functioning: worried, sad, annoyed
• Social functioning: ability to join in activities
• School functioning: schoolwork/homework
All statistical analyses were undertaken in 2019, using Stata version 13.

Ethics
The study was funded through a philanthropic donation

Participant characteristics
Complete data (including PedsQL total score and its sub-scales; CHU-9D-CHN dimensions and utility value; height and weight (converted to BMI z-score and weight status); gender; age; and parents' education level) were available for 1539 out of 1640 children (93.8% of those who consented and participated in study measurements) and are described in Table 1.
The mean age of the children was 6.6 years (SD = 0.42) and 54% were male. Around a third of parents were educated to below university degree level. The mean BMI z-score was -0.12 (SD = 1.29), whilst more than 17% of the children were either overweight (10.7%) or living with obesity (7.2%); this is comparable to national data from China for overweight and obesity in the same age group (20.4%) [21]. The mean CHU-9D-CHN utility score of the total sample was slightly higher using the UK tariff (mean = 0.937 [SD = 0.068]) than using the Chinese tariff (mean = 0.920 [SD = 0.094]) (Fig. 1). The mean total PedsQL score was 82.92 (SD = 11.21). Data on parental employment status were available for 1,539 children and are presented in Additional file 1: Appendix 2.
Fig. 1 Distribution of the CHU-9D utility scores based on both British and Chinese tariffs
Table 2 summarises the CHU-9D-CHN utility values and PedsQL total scores according to the weight status, gender of the children, and SES of the children's parents. The direction of the relationships was similar between instruments. Of interest, the mean utility scores using both UK and Chinese tariffs and mean PedsQL total scores were all marginally higher for children who were overweight/obese compared to those who were not. These differences were not statistically significant. The CHU-9D-CHN reported a slightly higher mean utility score for girls compared to boys (p = 0.003 and p = 0.004 respectively), consistent with the mean PedsQL total score, which was also higher in girls, although this difference was not statistically significant. Both HRQoL instruments reported scores that were marginally higher in children whose parents did not have a university education (lower SES) compared to those whose parents did, but again these differences were not statistically significant.
The analyses were rerun using parental employment status as an alternative proxy for SES and the results were similar (Additional file 1: Appendix 3). The mean (SD) utility scores for children who had a PedsQL score below the median value, compared to those with PedsQL scores on or above the median value, were 0.909 (0.075) and 0.967 (0.043) respectively for the UK tariff; and 0.881 (0.106) and 0.961 (0.056) respectively for the Chinese tariff (p < 0.001).
Table 3 summarises the mean CHU-9D-CHN utility values across the dimension levels of the PedsQL. The majority of children reported themselves in good health, with the largest proportion reporting themselves at the highest level for all dimensions of the PedsQL. In general, the mean CHU-9D-CHN utility values corresponded well, decreasing linearly with increasing levels of severity on each dimension of the PedsQL for emotional and social functioning domains, and decreasing monotonically with increasing levels of severity on each dimension of the PedsQL for physical and school functioning domains. This result was statistically significant (p < 0.001) for each of the dimensions.
Figure 2 shows a scatter plot of the relationship between the CHU-9D-CHN utility values (using the UK tariff) and the PedsQL total scores. Some anomalies were apparent. For instance, one child reported a high CHU-9D-CHN utility score of 0.963, yet had a low PedsQL total score of 34.78. However, in general, there was a moderate association between the instruments, with higher CHU-9D-CHN utility values corresponding with higher PedsQL total scores and the CHU-9D-CHN utility values and PedsQL total scores converging towards the highest end of the scale.
Figure 3 shows a scatter plot of the relationship between the CHU-9D-CHN utility values (using the Chinese tariff) and the PedsQL total scores. Figure 3 is similar to Fig. 2 but some wider anomalies were apparent. For instance, one child reported a high CHU-9D-CHN utility score of 0.996, yet had a low PedsQL total score of 34.78, and another child reported a low CHU-9D-CHN utility score of 0.535, yet had a high PedsQL total score of 82.60. However, in general, again there was a moderate association between the instruments, with higher CHU-9D-CHN utility values corresponding with higher PedsQL total scores and the CHU-9D-CHN utility values and PedsQL total scores converging towards the highest end of the scale.
Overall, the correlation between the CHU-9D-CHN utility values and PedsQL total scores was statistically significant, moderate and positive for both the UK tariff set (Spearman's rho = 0.5221, p < 0.001) and the Chinese tariff set (Spearman's rho = 0.5316, p < 0.001).

Convergent validity
The content and coverage of the two instruments were further compared by examining the correlation between each of the CHU-9D-CHN dimensions and the theoretically similar PedsQL domain functioning scores (Table 4). Using conventional cut-off values for Spearman's rho, each CHU-9D-CHN dimension was either weakly, or very weakly correlated with each of the predetermined PedsQL domain functioning scores. Since the CHU-9D-CHN dimensions were labelled with 1 as highest level and 5 as lowest level, the signs on the coefficients were consistently negative. All correlations were significant at the 0.01 level.
The paired comparison of the CHU-9D-CHN utility scores using the UK and Chinese tariffs showed that the mean UK utility values (0.937, SD 0.068) were, on average, marginally higher than the Chinese utility values (0.920, SD 0.094), and this difference was statistically significant (p < 0.001) (Fig. 1).

Statement of principal findings
With respect to known-group validity, contrary to studies conducted in Western countries [9,13,14], and although not statistically significant, we found an indication that HRQoL, using both the CHU-9D-CHN and the PedsQL, was higher in children whose parents had lower levels of education, compared to those whose parents were university educated. The CHU-9D-CHN demonstrated different scores according to the median PedsQL total score. For the discriminant validity, the mean CHU-9D-CHN utility values decreased linearly with increasing levels of severity on each dimension of the PedsQL for emotional and social functioning domains. They decreased monotonically with increasing levels of severity on each dimension of the PedsQL for physical and school functioning domains (p < 0.001). With respect to convergent validity, although there was a moderate significant positive correlation between CHU-9D-CHN utility values and PedsQL total scores, the correlations between individual CHU-9D-CHN dimensions and the theoretically similar PedsQL domains were weak or very weak. We also found the mean utility to be higher using the UK tariff set in comparison to the Chinese tariff set; this finding was expected given the underlying differences in valuation methodology and corresponding scale values.

Strengths and limitations of this study
Strengths include the large sample size (1539 children), diverse population (selected to include a range of socioeconomic backgrounds) and standardised data collection procedures as part of the randomised controlled trial. Furthermore, this study was one of the very few studies worldwide and the first study in China that collected utility-based HRQoL information in children as young as 6 years. It used both UK and Chinese tariffs for calculating the utility scores and reports on the psychometric properties of the CHU-9D-CHN in direct comparison to the widely used PedsQL instrument.
The study had some limitations. Data analysis was limited to data collected as part of the trial; the assessment of CHU-9D-CHN validity was therefore restricted to the socio-demographic and economic variables collected within the trial and to the PedsQL. However, there is no 'gold standard' instrument to assess construct validity in this context, and the PedsQL is a widely used HRQoL instrument validated for use with young children in diverse populations [15,16]. Although the CHU-9D has only been validated in children and adolescents from 7 to 18 years old, we have experience of using this in large studies with children as young as 6 years old [7,29]. Furthermore, as the only preference-based HRQoL instrument that has been designed exclusively with children for children, it was the most appropriate instrument to measure utility-based HRQoL at the time. Within the study, the CHU-9D was interviewer-administered because of the wide range of reading skills within the study population. This may have influenced the child responses, but we minimised this by using trained data collectors to interview participants individually in a private and familiar environment, away from other children and school staff. The interviewers were given age-appropriate communication skills training and read out the questions verbatim, providing clarification only when a child had language difficulties. Since the study was conducted, a new proxy version of the CHU-9D has been developed that is designed to be completed on behalf of children aged 5-7 years by an appropriate caregiver. Further research will determine if CHU-9D proxy values are a more appropriate method for assessing HRQoL in this age group than interviewer-administered CHU-9D self-assessed values. The evidence on whether proxy-reported values should be used for children is mixed, but there does seem to be a consensus that, where possible, self-report should be used, and this is especially the case when a judgement is required on unobservable signs or symptoms [30]. In terms of further limitations, as there are cultural, infrastructural and other system-related differences between China and other countries, the generalisability of results to other contexts, particularly to developed country settings, could be questionable.

Comparison with other studies
Regarding the discriminant validity, some findings were in line with a previous study reported from a UK setting [9]. With respect to known-group validation, an interesting result was that, unlike a UK study in children aged 5-6 years [9], an Australian study in children aged 11-17 years [13], and a Danish study in high-school students [14], this study found no evidence of lower HRQoL in children from a lower socio-economic background; in fact, the direction of effect was the reverse. This might be because the measures of SES are not equivalent in China and other countries. In a country in economic transition, educational level and employment may not reflect the same status as they do in the West. Also, in a communist country, SES measures may have less significance and no association with quality of life. The results of this study also differed from another study in a Chinese setting that reported a statistically significant trend for higher HRQoL scores (using PedsQL) in children who had parents with higher levels of education [21]. Two main differences were noted: in this study, all children were 6-7 years old (compared to 5-12 years old in the other study) and were from state schools, whereas in the other study 30% attended private schools for children of economic migrants. It is also worth noting that this study was conducted within a large urban city in China, where educational levels are generally higher, and a large proportion of parents reported being university educated.
For the convergent validity, the findings were similar to the previous studies in the UK and China [9,17]. The weak, or very weak correlation between the individual dimensions of each instrument might be because these individual dimensions describe something that is quite specific and different while appearing superficially similar. Also, perhaps there are overlaps between elements in some domains/dimensions which are resulting in the weak correlations, whilst the overall scores are better correlated.

Conclusions
Overall, the findings provide some support for the construct validity of the CHU-9D-CHN within a Chinese population aged 6-7 years. This is because (1) the CHU-9D-CHN was sensitive to known differences determined by the PedsQL median score; (2) the mean CHU-9D-CHN utility values decreased linearly with increasing levels of severity on each dimension of the PedsQL for emotional and social functioning domains, and they decreased monotonically with increasing levels of severity on each dimension of the PedsQL for physical and school functioning domains (p < 0.001); and (3) there was a moderate significant positive correlation between CHU-9D-CHN utility values and PedsQL total scores. However, there still remain areas of uncertainty, as the CHU-9D-CHN dimensions were only weakly correlated with theoretically similar PedsQL dimensions and it is unclear why this was the case.
Overall, we recommend that future studies continue to test the validity of the CHU-9D in China and in other countries sharing similar cultures or SES profiles to China. This is important because the measure may have different construct validity in different populations, which might affect the results of health economic evaluations.